Data and Factors





Kerry Back

Basics

  • Price and volume
  • Company financials
  • Analyst forecasts
  • Earnings surprises
  • Dividend announcements

Data on positions and trades

  • Corporate insiders
  • Short interest
  • Quarterly fund filings
  • Retail order flow (buy and sell)

Macroeconomic data

  • Federal Reserve Economic Data (FRED)
  • Energy Information Administration (EIA)
  • World Bank, …

Sentiment data

  • Scrape social media or buy/scrape news
  • Extract mentions of tickers
  • Use machine-learning NLP (natural language processing) to classify each mention as positive, negative, or neutral
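
The pipeline above (collect text, extract ticker mentions, classify) can be sketched in a few lines. This is only an illustration under stated assumptions: the ticker set, the word lists, and the function names are hypothetical, and a simple word-count rule stands in for the machine-learning classifier that would be used in practice.

```python
import re
from collections import defaultdict

# Hypothetical ticker universe and word lists, for illustration only.
TICKERS = {"AAPL", "TSLA", "MSFT"}
POSITIVE = {"beat", "surge", "strong", "upgrade", "bullish"}
NEGATIVE = {"miss", "plunge", "weak", "downgrade", "bearish"}


def extract_tickers(text):
    """Return known tickers written as AAPL or $AAPL, in order of first mention."""
    found = []
    for candidate in re.findall(r"\$?\b([A-Z]{1,5})\b", text):
        if candidate in TICKERS and candidate not in found:
            found.append(candidate)
    return found


def classify(text):
    """Stand-in for an NLP model: count positive words minus negative words."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_by_ticker(posts):
    """Tally the label of each post under every ticker the post mentions."""
    counts = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0})
    for post in posts:
        label = classify(post)
        for ticker in extract_tickers(post):
            counts[ticker][label] += 1
    return dict(counts)


posts = [
    "$TSLA deliveries beat expectations, strong quarter",
    "Analysts downgrade AAPL after weak iPhone demand",
    "MSFT and AAPL both report earnings next week",
]
result = sentiment_by_ticker(posts)
for ticker, tally in result.items():
    print(ticker, tally)
```

Swapping `classify` for a trained text classifier would leave the rest of the tally logic unchanged.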

Other data

  • Images
    • Satellite and drone imagery
    • Warehouse truck activity, cars in parking lots, …
    • Use machine learning/AI to analyze images
  • Search engine traffic
  • Phone location, …

Monthly returns and characteristics

  • We’ll use monthly data from 2000 to 2021.
  • Monthly returns
  • 100+ stock characteristics known at the beginning of each month, derived from
    • past prices
    • company financials
    • analyst forecasts
    • earnings announcements

  • Mimic trading monthly
    • Form portfolio at beginning of month
    • Observe returns and changes in characteristics
    • Form new portfolio, …
  • Data is on a SQL server at CloudClusters.net
  • Variable definitions
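
The monthly trading loop described above can be sketched with pandas. This is a minimal illustration, not the course code: the panel is synthetic rather than pulled from the SQL server, and the names `signal`, `ret`, `backtest`, and `n_hold` are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic panel for illustration only: 24 months x 50 stocks.
rng = np.random.default_rng(0)
months = pd.period_range("2000-01", periods=24, freq="M")
tickers = [f"S{i:02d}" for i in range(50)]
index = pd.MultiIndex.from_product([months, tickers], names=["month", "ticker"])
panel = pd.DataFrame(
    {
        # Hypothetical characteristic, known at the beginning of the month.
        "signal": rng.normal(size=len(index)),
        # Return realized over the month.
        "ret": rng.normal(0.01, 0.08, size=len(index)),
    },
    index=index,
)


def backtest(panel, n_hold=10):
    """Each month, hold the n_hold stocks with the highest signal, equally weighted."""
    rows = []
    for month, cross_section in panel.groupby(level="month"):
        # Form the portfolio at the beginning of the month from the signal alone.
        held = cross_section.nlargest(n_hold, "signal")
        # Observe the month's return on the equal-weighted portfolio.
        rows.append({"month": month, "ret": held["ret"].mean()})
        # The next iteration forms a new portfolio from next month's signal.
    return pd.DataFrame(rows).set_index("month")["ret"]


portfolio_returns = backtest(panel)
cumulative = (1 + portfolio_returns).cumprod()
print(portfolio_returns.head())
print("Growth of $1:", round(cumulative.iloc[-1], 3))
```

Because the signal here is random noise, no outperformance should be expected; the point is only the form-observe-reform cycle.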

Factor investing

  • Factors are stock characteristics such that

    • stocks that share characteristics tend to move together
    • characteristics predict average returns
  • Value, momentum, profitability, investment rate, volatility, accruals, …

  • Factor investing at BlackRock

  • Factor investing at AQR
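
One common way to check whether a characteristic predicts average returns is to sort stocks into groups on it each month and compare the groups' average returns. The sketch below assumes synthetic data in which returns are built to load on a hypothetical characteristic `char`; the column names and the choice of quintiles are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only: 60 months x 200 stocks.
rng = np.random.default_rng(1)
n_months, n_stocks = 60, 200
df = pd.DataFrame(
    {
        "month": np.repeat(np.arange(n_months), n_stocks),
        "char": rng.normal(size=n_months * n_stocks),
    }
)
# Assumption for the demo: returns load positively on the characteristic.
df["ret"] = 0.01 + 0.02 * df["char"] + rng.normal(0, 0.10, size=len(df))

# Sort stocks into quintiles within each month (1 = lowest characteristic).
df["quintile"] = (
    df.groupby("month")["char"].transform(lambda x: pd.qcut(x, 5, labels=False)) + 1
).astype(int)

# Equal-weighted return of each quintile in each month.
quintile_returns = df.groupby(["month", "quintile"])["ret"].mean().unstack()

# Time-series averages, and the high-minus-low spread.
average_returns = quintile_returns.mean()
long_short = quintile_returns[5] - quintile_returns[1]

print(average_returns.round(4))
print("Average high-minus-low return:", round(long_short.mean(), 4))
```

With real data, the same sort would be run on value, momentum, or any other candidate characteristic.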

Overview of machine learning

  • What do we want to predict?
    • Return?
    • Return minus average return?
    • Rank of return (1, 2, 3, …)?
    • Whether return will be in the top, middle, or bottom group? Or which decile?
  • What predictors do we want to use?
  • What model do we want to use?
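
The candidate prediction targets listed above can all be built from the same return column. The following sketch uses synthetic returns and hypothetical column names, and it assumes that rank 1 means the lowest return in the month (the direction is a convention, not something fixed by the slides).

```python
import numpy as np
import pandas as pd

# Synthetic returns for illustration only: 3 months x 30 stocks.
rng = np.random.default_rng(2)
df = pd.DataFrame(
    {
        "month": np.repeat(["2000-01", "2000-02", "2000-03"], 30),
        "ret": rng.normal(0.01, 0.08, size=90),
    }
)
by_month = df.groupby("month")["ret"]

# Return minus the cross-sectional average return of the same month.
df["ret_demeaned"] = df["ret"] - by_month.transform("mean")

# Rank of return within the month (assumption: 1 = lowest return).
df["ret_rank"] = by_month.rank(method="first").astype(int)

# Bottom / middle / top third of the month's returns.
tercile = by_month.transform(lambda x: pd.qcut(x, 3, labels=False)).astype(int)
df["ret_tercile"] = tercile.map({0: "bottom", 1: "middle", 2: "top"})

# Decile of the month's returns (1 = lowest, 10 = highest).
df["ret_decile"] = (
    by_month.transform(lambda x: pd.qcut(x, 10, labels=False)) + 1
).astype(int)

print(df.head())
```

The raw return and the demeaned return are regression targets, while the tercile and decile labels turn the problem into classification.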

Some models

  • Forests
    • Random forests
    • Boosting
  • Neural networks
  • Linear regression and variants
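
As a rough illustration, each model family above has a counterpart in scikit-learn. The sketch below fits them to synthetic data; the hyperparameters are arbitrary choices for the demo, ridge and lasso are taken here as examples of linear regression variants, and the simple split is an assumption (with real return data the split would be chronological).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.neural_network import MLPRegressor

# Synthetic predictors and target for illustration only.
rng = np.random.default_rng(3)
X = rng.normal(size=(400, 5))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] ** 2 + rng.normal(0, 0.5, size=400)

# First 300 rows for training, last 100 for testing.
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

models = {
    "random forest": RandomForestRegressor(
        n_estimators=100, max_depth=4, random_state=0
    ),
    "boosting": GradientBoostingRegressor(
        n_estimators=100, max_depth=2, random_state=0
    ),
    "neural network": MLPRegressor(
        hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0
    ),
    "linear regression": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.01),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns the R-squared on the held-out rows.
    scores[name] = model.score(X_test, y_test)

for name, r2 in scores.items():
    print(f"{name:18s} out-of-sample R2 = {r2:.3f}")
```

All six objects share the same `fit` / `predict` / `score` interface, so the choice of model can be changed without touching the rest of the pipeline.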

The value of machine learning

Gu, Kelly, and Xiu, 2020